76 research outputs found

    Deep Recurrent Music Writer: Memory-enhanced Variational Autoencoder-based Musical Score Composition and an Objective Measure

    Abstract: In recent years, there has been increasing interest in music generation using machine learning techniques typically applied to classification or regression tasks. The field is still in its infancy, and most attempts impose many restrictions on the composition process in order to favor the creation of “interesting” outputs. Furthermore, and most importantly, none of the past attempts has focused on developing objective measures to evaluate the music composed, which would allow composed pieces to be judged against a predetermined standard and models to be fine-tuned for better “performance” and particular composition goals. In this work, we intend to advance the state of the art in this area by introducing and evaluating a new metric for the objective assessment of the quality of generated pieces. We use this measure to evaluate the outputs of a truly generative model based on Variational Autoencoders that we apply here to automated music composition. Using our metric, we demonstrate that our model can generate music pieces that follow the general stylistic characteristics of a given composer or musical genre. Additionally, we use this measure to investigate the impact of various parameters and model architectures on the compositional process and its output.
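For context on the model class behind this abstract: a Variational Autoencoder is trained on an evidence lower bound (ELBO) whose KL term keeps the latent posterior close to a standard-normal prior, which is what lets the trained decoder generate new pieces by sampling that prior. Below is a minimal NumPy sketch of the two ELBO terms; this is the generic VAE objective, not the paper's memory-enhanced architecture, and the squared-error (Gaussian-likelihood) reconstruction term is an assumption for illustration.

```python
import numpy as np

def vae_elbo_terms(x, x_recon, mu, log_var):
    """Sketch of the two terms of a generic VAE training objective.

    x, x_recon: input batch and its reconstruction from the decoder.
    mu, log_var: parameters of the latent posterior N(mu, exp(log_var)).
    Returns (reconstruction error, KL divergence to the N(0, I) prior).
    """
    recon = ((x - x_recon) ** 2).sum()  # Gaussian-likelihood reconstruction
    kl = 0.5 * (np.exp(log_var) + mu ** 2 - 1.0 - log_var).sum()
    return recon, kl
```

Training minimizes `recon + kl`; afterwards one samples z ~ N(0, I) and decodes it, which is why such a model is called "truly generative".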

    Audio Barlow twins: self-supervised audio representation learning

    The Barlow Twins self-supervised learning objective requires neither negative samples nor asymmetric learning updates, achieving results on a par with the current state of the art in Computer Vision. We therefore present Audio Barlow Twins, a novel self-supervised audio representation learning approach that adapts Barlow Twins to the audio domain. We pre-train on the large-scale audio dataset AudioSet and evaluate the quality of the learnt representations on 18 tasks from the HEAR 2021 Challenge, achieving results that outperform, or are on a par with, the current state of the art for instance-discrimination self-supervised approaches to audio representation learning. Code is available at https://github.com/jonahanton/SSL_audio
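The Barlow Twins objective the abstract builds on is compact enough to sketch. The NumPy snippet below is a minimal illustration, not the authors' AudioSet training code: it standardizes two batches of embeddings of two augmented views, forms their cross-correlation matrix, and drives it toward the identity (diagonal terms enforce invariance, off-diagonal terms reduce redundancy).

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lambd=5e-3):
    """Barlow Twins objective on two batches of embeddings.

    z_a, z_b: (N, D) embeddings of two augmented views of the same batch.
    Pushes the views' cross-correlation matrix toward the identity:
    diagonal -> 1 (invariance), off-diagonal -> 0 (redundancy reduction).
    """
    n, d = z_a.shape
    # Standardize each embedding dimension over the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-8)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-8)
    c = z_a.T @ z_b / n                          # (D, D) cross-correlation
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lambd * off_diag
```

Because the loss depends only on the correlation structure of positive pairs, no negative samples and no asymmetric predictor or stop-gradient updates are needed, which is the property the abstract highlights.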

    Deep Learning for Mobile Mental Health: Challenges and recent advances

    Mental health plays a key role in everyone’s day-to-day life, impacting our thoughts, behaviours, and emotions. Moreover, over the past years, given their ubiquity and affordability, smartphones and wearable devices have been adopted rapidly and now support all aspects of mental health research and care, spanning from screening and diagnosis to treatment and monitoring, and have enabled significant progress in remote mental health interventions. While many challenges remain to be tackled in this emerging cross-disciplinary research field, such as data scarcity, lack of personalisation, and privacy concerns, it is of primary importance that innovative signal processing and deep learning techniques are exploited. In particular, recent advances in deep learning can provide the key enabling technology for the next generation of user-centric mobile mental health applications. In this article, we first outline the basic principles of mobile device-based mental health analysis, review the main system components, and highlight the conventional technologies involved. Next, we describe several major challenges and the deep learning technologies that have the potential to address each of them. Finally, we discuss remaining problems that need to be addressed through research collaboration across multiple disciplines. This paper has been partially funded by the Bavarian Ministry of Science and Arts as part of the Bavarian Research Association ForDigitHealth, the National Natural Science Foundation of China (Grant Nos. 62071330, 61702370), and the Key Program of the National Natural Science Foundation of China (Grant No. 61831022).

    Speech Emotion Recognition Considering Local Dynamic Features

    Recently, increasing attention has been directed to the study of speech emotion recognition, in which global acoustic features of an utterance are mostly used to eliminate content differences. However, the expression of speech emotion is a dynamic process, which is reflected through dynamic durations, energies, and other prosodic information when one speaks. In this paper, a novel local dynamic pitch probability distribution feature, obtained by computing a histogram, is proposed to improve the accuracy of speech emotion recognition. Compared with most previous works using global features, the proposed method takes advantage of the local dynamic information conveyed by the emotional speech. Several experiments on the Berlin Database of Emotional Speech are conducted to verify the effectiveness of the proposed method. The experimental results demonstrate that the local dynamic information obtained with the proposed method is more effective for speech emotion recognition than the traditional global features. Comment: 10 pages, 3 figures, accepted by ISSP 201
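As a rough illustration of a histogram-based local pitch feature, the sketch below converts a frame-wise F0 track into a probability distribution over frame-to-frame pitch changes. The delta-pitch formulation, bin count, and range here are assumptions for illustration; the abstract does not specify the paper's exact binning.

```python
import numpy as np

def pitch_delta_histogram(f0, n_bins=20, delta_range=(-50.0, 50.0)):
    """Hypothetical sketch of a local dynamic pitch feature.

    f0: frame-wise fundamental-frequency track in Hz (0 = unvoiced).
    Keeps only voiced frames, takes frame-to-frame pitch changes, and
    returns their normalized histogram as a fixed-length feature vector.
    """
    voiced = f0[f0 > 0]
    if voiced.size < 2:
        return np.zeros(n_bins)
    deltas = np.diff(voiced)                      # local pitch dynamics
    hist, _ = np.histogram(deltas, bins=n_bins, range=delta_range)
    total = hist.sum()
    return hist / total if total > 0 else np.zeros(n_bins)
```

Unlike a single global pitch statistic, this vector retains how the pitch moves locally, which is the kind of dynamic information the paper argues is discriminative for emotion.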

    SEWA DB: A rich database for audio-visual emotion and sentiment research in the wild

    Natural human-computer interaction and audio-visual human behaviour sensing systems that achieve robust performance in the wild are needed more than ever, as digital devices become an indispensable part of our lives. Accurately annotated real-world data are the crux of devising such systems. However, existing databases usually consider controlled settings, low demographic variability, and a single task. In this paper, we introduce the SEWA database of more than 2000 minutes of audio-visual data of 398 people from six cultures, 50% female, uniformly spanning the age range of 18 to 65 years. Subjects were recorded in two different contexts: while watching adverts and while discussing the adverts in a video chat. The database includes rich annotations of the recordings in terms of facial landmarks, facial action units (FAU), various vocalisations, mirroring, continuously valued valence, arousal, liking, and agreement, and prototypic examples of (dis)liking. This database aims to be an extremely valuable resource for researchers in affective computing and automatic human sensing and is expected to push forward research in human behaviour analysis, including cultural studies. Along with the database, we provide extensive baseline experiments for automatic FAU detection and automatic valence, arousal, and (dis)liking intensity estimation.

    Speech Processing and Prosody

    The prosody of the speech signal conveys information beyond the linguistic content of the message: prosody structures the utterance and also carries information on the speaker’s attitude and emotion. The prosodic features are the duration of sounds, energy, and fundamental frequency; however, their automatic computation and use are not straightforward. Sound duration features are usually extracted from speech recognition results or from a forced speech-text alignment. Although the resulting segmentation is usually acceptable on clean native speech data, performance degrades on noisy or non-native speech. Many algorithms have been developed for computing the fundamental frequency; they achieve rather good performance on clean speech, but again, performance degrades in noisy conditions. However, in some applications, for example computer-assisted language learning, the relevance of the prosodic features is critical; indeed, the quality of the diagnosis of the learner’s pronunciation heavily depends on the precision and reliability of the estimated prosodic parameters. The paper considers the computation of prosodic features, shows the limitations of automatic approaches, and discusses the problem of computing confidence measures on such features. It then discusses the role of prosodic features and how they can be handled in automatic processing tasks such as the detection of discourse particles, the characterisation of emotions, and the classification of sentence modalities, as well as in computer-assisted language learning and expressive speech synthesis.
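To make the fragility of automatic F0 computation concrete, here is a toy autocorrelation pitch estimator for a single frame. Production pitch trackers add voicing decisions, octave-error correction, and noise robustness on top of this; it is exactly those steps that degrade on noisy or non-native speech and motivate the confidence measures discussed above.

```python
import numpy as np

def estimate_f0(frame, sr, fmin=60.0, fmax=400.0):
    """Minimal autocorrelation F0 estimator for one voiced frame.

    frame: 1-D array of samples; sr: sample rate in Hz.
    Picks the lag with the strongest autocorrelation inside the
    plausible pitch range [fmin, fmax] and converts it to Hz.
    """
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])           # best lag in the F0 range
    return sr / lag
```

On a clean periodic frame this works well; add noise or creaky voicing and the argmax readily jumps to a wrong (often octave-related) lag, which is why per-frame confidence measures matter.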

    Anthrax Lethal Toxin Disrupts Intestinal Barrier Function and Causes Systemic Infections with Enteric Bacteria

    A variety of intestinal pathogens, including Bacillus anthracis, have virulence factors that target mitogen-activated protein kinase (MAPK) signaling pathways. Anthrax lethal toxin (LT) has specific proteolytic activity against the upstream regulators of MAPKs, the MAPK kinases (MKKs). Using a murine model of intoxication, we show that LT causes dose-dependent disruption of intestinal epithelial integrity, characterized by mucosal erosion, ulceration, and bleeding. This pathology correlates with an LT-dependent blockade of intestinal crypt cell proliferation, accompanied by marked apoptosis in the villus tips. C57BL/6J mice treated with intravenous LT nearly uniformly develop systemic infections with commensal enteric organisms within 72 hours of administration. LT-dependent intestinal pathology depends upon its proteolytic activity and is partially attenuated by co-administration of broad-spectrum antibiotics, indicating that it is both a cause and an effect of infection. These findings indicate that targeting of MAPK signaling pathways by anthrax LT compromises the structural integrity of the mucosal layer, serving to undermine the effectiveness of the intestinal barrier. Combined with the well-described immunosuppressive effects of LT, this disruption of the intestinal barrier provides a potential mechanism for host invasion via the enteric route, a common portal of entry during the natural infection cycle of Bacillus anthracis.

    Identification of Ligand Binding Sites of Proteins Using the Gaussian Network Model

    The nonlocal nature of the protein-ligand binding problem is investigated via the Gaussian Network Model, with which the residues lying along interaction pathways in a protein and the residues at the binding site are predicted. The predictions of the binding site residues are verified using several benchmark systems where the topology of both the unbound protein and the bound protein-ligand complex is known. Predictions are made on the unbound protein; agreement of the results with the bound complexes indicates that the information for binding resides in the unbound protein. Cliques of three or more residues that are far apart along the primary structure but in contact in the folded structure are shown to be important determinants of the binding problem. Comparison with known structures shows that the predictive capability of the method is significant.
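For readers unfamiliar with the method, a minimal Gaussian Network Model can be sketched in a few lines: build the Kirchhoff (connectivity) matrix from C-alpha contacts and take its eigenmodes. The 7 Å cutoff is a conventional choice, not necessarily the one used in this work, and the sketch omits the pathway and clique analysis the abstract describes.

```python
import numpy as np

def gnm_modes(coords, cutoff=7.0):
    """Minimal Gaussian Network Model sketch.

    coords: (N, 3) C-alpha coordinates. Builds the Kirchhoff
    (graph-Laplacian) matrix from pairwise contacts within `cutoff`
    angstroms and returns its eigenvalues and eigenvectors in
    ascending order of eigenvalue.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    gamma = -(d < cutoff).astype(float)          # -1 for each contact pair
    np.fill_diagonal(gamma, 0.0)                 # no self-contacts
    np.fill_diagonal(gamma, -gamma.sum(axis=1))  # contact degree on diagonal
    return np.linalg.eigh(gamma)
```

Slow (low-eigenvalue) modes describe global collective motions, while the fastest modes localize on tightly packed residues; such clusters of residues distant in sequence but in contact in the fold are the cliques the abstract links to binding sites.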

    Functional roles of fibroblast growth factor receptors (FGFRs) signaling in human cancers
